Multi-modal Interview Concept Detection for Rushes Exploitation
Authors
Abstract
According to the Large-Scale Concept Ontology for Multimedia (LSCOM) and the requirements of the fourth task of TRECVID 2006, rushes exploitation, “interview” is an important semantic concept for rushes content analysis. This paper presents a shot-level “interview” concept detection method. Face detection and audio classification are applied to detect the “face” and “speech” concepts in each shot, and the “interview” concept is then detected by integrating this audiovisual information. The method is intended to support video editing. Large-scale experimental results demonstrate the accuracy and effectiveness of the proposed method.

1. Introduction

The TREC conference series is sponsored by the National Institute of Standards and Technology (NIST), with additional support from other U.S. government agencies. The goal of the conference series is to encourage research in information retrieval (Guidelines, 2006). TRECVID 2006 comprises three system tasks and one exploratory task: shot boundary determination, high-level feature extraction, search, and rushes exploitation.

In the broadcasting and filmmaking industries, “rushes” is a term for raw footage used in productions such as TV programs and movies. Usually up to 40 hours of raw footage is converted into one hour of TV program (P. Allen, 2005). Rushes contain many static scenes, redundant episodes, and out-of-focus fragments. Their soundtracks can be noisy and indecipherable for automatic speech recognition. Moreover, there are no captions, and little textual information is available for rushes content analysis (P. Allen, 2005). Because of these characteristics, content analysis of rushes differs from current work on edited video such as movies, news video, and sports video. Consequently, developing novel data mining techniques for rushes is both challenging and promising.
For TRECVID 2006, the BBC Archive provided about 50 hours of rushes for the rushes exploitation task. The footage mainly consists of interview scenes, person activity scenes, natural scenes, and some redundant shots. Our report on rushes exploitation in TRECVID 2006 (Tang, 2006) gave a general overview of our work on these items. Of these, interview scenes are the most useful for news program production. Whereas previous work focused only on specific-person identification and domain-knowledge-based video content indexing of edited news video (Kuo, 2005; Albiol, 2003), interview concept detection aims to extract complete semantic episodes from the raw material for video editing. This paper therefore presents a shot-level “interview” concept detection method in detail.

The rest of the paper is organized as follows. Section 2 describes the shot-level “interview” detection method. Section 3 provides the experimental results and a detailed analysis. Section 4 presents concluding remarks and future work.

Conference RIAO2007, Pittsburgh PA, U.S.A. May 30-June 1, 2007 Copyright C.I.D. Paris, France

2. Shot-level interview concept detection

According to the 330th concept of LSCOM (LSCOM; Naphade, 2006), interview shots are mainly those filmed at a specific location outside the studio. Generally speaking, interview shots fall into two kinds: monologue, as shown in Fig.1 (a), and dialogue, as shown in Fig.1 (b).
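The audiovisual integration described in the abstract can be illustrated with a minimal sketch. The excerpt does not state the actual fusion rule, so everything below is an assumption made for illustration: the `Shot` fields, the `is_interview` function, the 0.5 thresholds, and the simple "face and speech" conjunction are all hypothetical, and the per-shot scores are taken as already produced by some face detector and audio classifier.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """Per-shot detector outputs (hypothetical structure, not from the paper)."""
    shot_id: str
    face_ratio: float    # fraction of sampled frames containing a detected face
    speech_ratio: float  # fraction of audio segments classified as speech


def is_interview(shot: Shot,
                 face_threshold: float = 0.5,
                 speech_threshold: float = 0.5) -> bool:
    """Assumed fusion rule: a shot is an interview candidate when both
    the "face" and the "speech" concepts are detected in it."""
    has_face = shot.face_ratio >= face_threshold
    has_speech = shot.speech_ratio >= speech_threshold
    return has_face and has_speech


# Made-up example values, for illustration only.
shots = [
    Shot("s1", face_ratio=0.9, speech_ratio=0.8),  # face and speech
    Shot("s2", face_ratio=0.9, speech_ratio=0.1),  # face, little speech
    Shot("s3", face_ratio=0.0, speech_ratio=0.7),  # speech, no face
]

interview_ids = [s.shot_id for s in shots if is_interview(s)]
print(interview_ids)
```

A conjunction is the simplest way to combine the two cues; a real system might instead weight the scores or train a classifier on them, and would need an additional cue to separate monologue from dialogue shots.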
Similar resources
TRECVID 2006 Rushes Exploitation By CAS MCG *
In our rushes exploitation task of TRECVID 2006, we propose a novel and interactive rushes video selection and editing method based on hierarchical browsing of key frames, where high-level features of each key frame such as face, interview, person, crowd, building, outdoor, waterbody, and other information about redundancy and repetition are displayed at the same time to help editors select ...
Rushes Exploitation 2006 By CAS MCG*
In our rushes exploitation task of TRECVID 2006, we propose a novel and interactive rushes video selection and editing method based on hierarchical browsing of key frames, where high-level features of each key frame such as face, interview, person, crowd, building, outdoor, waterbody, and other information about redundancy and repetition are displayed at the same time to help editors select ...
Damage detection of multi-girder bridge superstructure based on the modal strain approaches
The research described in this paper focuses on the application of modal strain techniques on a multi-girder bridge superstructure with the objectives of identifying the presence of damage and detecting false damage diagnosis for such structures. The case study is a one-third scale model of a slab-on-girder composite bridge superstructure, comprised of a steel-free concrete deck with FRP rebars...
Eurécom at TRECVid 2006: Extraction of High-level Features and BBC Rushes Exploitation
For the fourth year we have participated in the high-level feature extraction task, and we pursued our effort on the fusion of classifier outputs. Unfortunately a single run was submitted for evaluation this year, due to a lack of computational resources during the limited time available for training and tuning the entire system. This year’s run is based on an SVM classification scheme. Localised co...
Video Content Browsing Based on Iterative Feature Clustering for Rushes Exploitation
We have implemented a feature-independent framework for video content browsing, which is based on iterative clustering and filtering of the content set. Plug-ins for content clustering using the features camera motion, motion activity, audio volume, face occurrences, global color similarity and object similarity have been implemented. A light table view is used for visualiza...